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Abstract 

Background: Massive growth in human mobility has dramatically increased the 
risk and rate of pandemic spread. Macro-level descriptors of the topology of the 
World Airline Network (WAN) explains middle and late stage dynamics of 
pandemic spread mediated by this network, but necessarily regard early stage 
variation as stochastic. We propose that much of early stage variation can be 
explained by appropriately characterizing the local topology surrounding the 
debut location of an outbreak. 

Methods: Based on a model of the WAN derived from public data, we measure 
for each airport the expected force of infection (AEF) which a pandemic 
originating at that airport would generate. We observe, for a subset of world 
airports, the minimum transmission rate at which a disease becomes 
pandemically competent at each airport. We also observe, for a larger subset, the 
time until a pandemically competent outbreak achieves pandemic status given its 
debut location. Observations are generated using a highly sophisticated 
metapopulation reaction-diffusion simulator under a disease model known to well 
replicate the 2009 influenza pandemic. The robustness of the AEF measure to 
model misspecification is examined by degrading the underlying model WAN. 
Results: AEF powerfully explains pandemic risk, showing correlation of 0.90 to 
the transmission level needed to give a disease pandemic competence, and 
correlation of 0.85 to the delay until an outbreak becomes a pandemic. The AEF 
is robust to model misspecification. For 97% of airports, removing 15% of 
airports from the model changes their AEF metric by less than 1%. 

Conclusions: Appropriately summarizing the size, shape, and diversity of an 
airport’s local neighborhood in the WAN accurately explains much of the 
macro-level stochasticity in pandemic outcomes. 
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Background 

The world airline network (WAN) has massively increased the speed and scope 
of human mobility. This boon for humanity has also created an efficient global 
transport network for infectious disease [1, 2]. Pandemics can now occur more easily 
and more quickly than ever before. The accelerating emergence of novel pathogens 
exacerbates the situation [3]. Better understanding of global dispersal dynamics is 
a major challenge of our century [4]. Rapid assessment of an emerging outbreak’s 
dissemination potential is critical to response planning [5]. We do not know where 
the next pandemic threat might emerge. Mexico was not a prime candidate for an 
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influenza outbreak, nor West Africa for Ebola. Preemptively mapping the pandemic 
influence of individual airports could contribute substantially to monitoring and 
response plans. 

While exact relationships between the WAN and pandemic spread are difficult to 
model [2], simulation studies suggests that topological descriptors which describe 
epidemic outcomes on network models also have explanatory power for relation¬ 
ships between the topology of the WAN and pandemic spread [6, 7]. Observational 
studies of influenza [4, 8], malaria [9], and dengue fever [10] support this conclusion. 
Given the topology of a network, the minimal disease transmission rate which al¬ 
lows epidemics is given by the inverse of the spectral radius of a network’s adjacency 
matrix [11], and the typical outcome [12] and time course [13] of an epidemic fol¬ 
low a closed-form solution governed by the degree distribution of the network. The 
WAN’s topological structure is well characterized. It is a small-world, scale-free net¬ 
work with strong community structure, imposed partly by spatial constraints [14]. 
The majority of airports (70%) serve as bridges which connect a densely intercon¬ 
nected core of 73 major transport hubs (2%) to regional population centers and 
peripheral airports (28%) [15]. Nodes which connect communities can be distinct 
from high-degree nodes within communities [16]. Since the WAN is designed to 
optimize passenger flow, the network’s temporal structure has little effect at time 
scales relevant for pandemic spread [17]. 

Topological descriptors of epidemic dynamics, however, can only describe typical 
outcomes. They do not describe the structure of the variation around the typical 
outcome, which is dismissed as stochastic when mentioned at all. Even within the 
constraints of a simple branching process model, empirical estimates of the probabil¬ 
ity of epidemic show substantial variation around the analytically derived solution, 
see Eigure 1. Actual outcomes of emergent infectious diseases are crucially shaped 
by chance events in the early phases of their emergence [18]. Clear understanding 
of how seed location influences global outcomes would substantially improve public 
health planning [5]. 

The development of sophisticated, parameter-rich epidemic simulators provides 
powerful tools for exploring relationships between seed location and epidemic out¬ 
comes [19]. Common frameworks encompass demographic and mobility characteris¬ 
tics via either metapopulation [8, 20] or agent-based assumptions [21]. Careful tun¬ 
ing of these models has produced results which well match the spread of the 2009 
influenza epidemic [19, 22]. Yet the complex interactions between model structure, 
input parameters, and estimation methods makes interpretation of model-based 
results challenging [18], especially when attempting to generalized to future out¬ 
breaks for which epidemic parameters are fundamentally unknowable. If, however, 
two radically different modeling approaches result in such high agreement both with 
each other and with reality [22] then the principal driver of outcomes should be ex¬ 
pressible with a small parameter set [4]. Evidence suggests that simple probabilistic 
models incorporating local incidence, travel rates, and basic transmission parame¬ 
ters are sufficient to predict outcomes of complex metapopulation based simulations 
[23]. 

Recent theoretical work suggests that the apparent stochasticity in the early 
phases of a network-mediated epidemic process can be explained by the expec¬ 
tation of the force of infection of epidemic processes seeded from that node [24]. 
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The aim of this study is to evaluate if this finding generalizes to realistic scenarios 
of WAN-mediated pandemic disease spread. 

Methods 

Defining and measuring ExF on the WAN 

Our model of the WAN is based on the 2014 release of the Open Flights database 
[25]. We selected all airports serviced by regularly scheduled commercial flights, 
resulting in a list of 3,458 airports connected by 68,820 routes served by 171 dif¬ 
ferent aircraft types. We simplify the network by replacing multiple routes between 
airports by a single edge whose weight is the sum of the available seats on all routes 
connecting the two airports, under the assumption that the aircraft type reflects 
the airline’s best judgment of the importance of the route. Aircraft seating capacity 
was estimated based on aircraft descriptions on worldtrading.net and airliners.net, 
using airlinecodes.co.uk to translate the lATA aircraft codes into aircraft type. 

The expected force of a network node is defined as the expectation of the force of 
infection generated by an epidemic process seeded from the node into an otherwise 
fully susceptible network, after two transmission events and no recovery [24]. The 
force of infection in a network is directly proportionate to the current number of 
infected-susceptible edges or in a weighted network, the sum of all such edge weights. 
Its expectation after two transmission is given by the entropy of the distribution of 
this sum over all possible ways the first two transmission could occur. Applied to 
the WAN, 

.7 

AEF{i) = — dj log(d, ) 
i=i 

where AEF(i) is Airport i’s Expected Force, J enumerates all possible ways to 
observe two transmissions seeded from i, dj is the weighted degree of the trans¬ 
mission pattern multiplied by the probability that this pattern is observed given J, 
and dj = is the normalization of dj. We here further normalize AEF 

values to the range [0,100]. 

Epidemic outcomes are generated using the GLEaMvis simulator [8, 20]. 
GLEaMvis integrates real-world global population and mobility data with an indi¬ 
vidual based stochastic mathematical model of the infection dynamics to produce 
realistic simulations of the global spread of infectious diseases. Our basic experi¬ 
mental setup is to simulate the same disease model over a range of seed cities. The 
structure and parameters of the disease models are based on those which match 
the 2009 Influenza pandemic as reported in [8] and validated in [19], specifically, 
a Susceptible-Exposed-Infected-Recoverd (SEIR) model with transmission rate /3 
specified below, latency rate e = 1/1.1 and recovery rate = 1/2.5. Rates are ex¬ 
pressed in units of days. The initial population distribution is 10% of the seed city 
infected and the remainder of the (world) population susceptible. Seasonality effects 
are not included, since their influence varies both by time and geographic latitude, 
masking variability attributable to seed location. GLEaMvis divides the world into 
sixteen regions. An outbreak is declared a pandemic on the day prevalence in at 
least three regions is greater than one per 100,000 inhabitants. The pattern the 
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results is invariant to thresholds in the range [0.1,100] per 100,000 inhabitants and 
to replacing the “three regions” criteria with “100 cities.” Results for each airport 
are reported in terms of the median over 20 runs (the maximum number supported 
by the public GLEaMvis client). If the threshold is not passed after 365 days (the 
maximum length supported by the public GLEaMvis client), we declare that no 
pandemic occurred. 

Defining and measuring epidemic stochasticity 

For an outbreak to become a pandemic, its basic reproductive number Rq must 
surpass the basic epidemic threshold Rq > 1 needed to establish a disease in a 
local population by a sufficient amount to also overcome finite subpopulation size 
effects and diffusion rates to neighboring populations. A branching process approx¬ 
imation suggests that invasion thresholds in metapopulation models depend on the 
outbreak’s Rq value, the variance of the network’s degree distribution, and the 
mobility rate between subpopulations [7]. The GLEaMviz model specifies the last 
two values, reducing invasion thresholds to a function of Rq. However, as shown in 
Figure 1, even a pure branching process shows substantial variability around the 
theoretical probability of achieving a large outbreak. For pandemics mediated by 
the WAN, the question of interest is how the invasion threshold varies for different 
airports. We empirically observe invasion thresholds on the WAN as follows. Ten 
seed airports are selected, one from each decile of the range of AEF values, see Table 
1. The basic reproductive number is defined as Rq = /3//i, the transmission to re¬ 
covery ratio. Keeping ^ fixed, we vary /3 over the range [0.4, 0.5], and observe which 
seeds trigger a pandemic at each value under the simulation framework described 
above. For (3 < 0.4, no simulations reached pandemic status, and for (3 > 0.45, all 
simulations resulted in a pandemic. 

Often, diseases of concern are known to be competent of invading the network. 
Here, the outcome of interest is not if a pandemic occurs, but rather how long until 
an outbreak reaches pandemic status. We measure relationships between AEF and 
time to pandemic status as follows. One hundred world airports were chosen such 
that they evenly cover the range of measured AEF values. The simple SEIR model 
used previously is extended to the full model used by the GLEaMvis group to repli¬ 
cate the 2009 pandemic [8]. This estimates transmission rate as /3 = 0.8383, well 
above the invasion threshold of /3 « 0.45 determined empirically above. Further, the 
infected compartment of the SEIR model is divided into three categories: asymp¬ 
tomatic, symptomatic travelers, symptomatic non-travelers. These categories affect 
the mobility model, and non-symptomatic individuals have reduced transmissibility. 
For each seed location, we observe both the number of days until pandemic status 
is reached and the number of days until peak global incidence. Both outcomes are 
highly correlated, since once pandemic status is achieved further disease develop¬ 
ment is determined by network topology. The purpose of measuring peak global 
incidence is that this measure is unambiguous, while any definition of “first day 
of pandemic status” is somewhat arbitrary. A Shapiro-Wilks test of the observed 
times to peak global incidence suggests that this data is approximately normally dis¬ 
tributed (p = 0.69 under the null hypothesis that the data is normally distributed), 
while the distribution of observations of first day of pandemic status is right-skewed 
(p = 0.04). 
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Relationships between outcomes and AEF are measured by Pearson correlation. 
We additionally test correlations to weighted and unweighted versions of each air¬ 
port’s betweenness, degree, and eigenvalue centralities, and also to Verma et al’s 
t-core, a variant of the k-core which counts triangles [15]. 

Robustness of AEF to sampling error 

The robustness of AEF values is examined by observing their relative change while 
progressively degrading the model WAN from which they are derived. The network 
is degraded by removing from one to 15 percent of U.S. airports from the network 
along with their associated edges. Community-based analysis of the WAN suggest 
that US airports form one large community [15, 16]. The AEF of all remaining world 
airports is computed. Three different random removal schemes were used: uniform 
over all airports, selection weighted by airport degree, selection weighted by AEF. 
The resulting AEF values are compared with the original AEF values. We record 
the number of airports whose degraded AEF departs from its original AEF by more 
than 1% and by more than 5%. Reported results are the average over ten runs, and 
show the amount of degradation for both U.S. and non-U.S. airports. 

Results 

The AEF of the seed location is strongly predictive of an outbreak’s invasive thresh¬ 
old as shown in Figure 2 and Table 1. The correlation between AEF and the minimal 
observed transmission rate at which it first became pandemically competent was 
0.90 (95% confidence interval: 0.98,0.62). Tokyo was a notable outlier, achieving 
pandemic competence earlier than predicted from its AEF value. 

AEF was also strongly correlated with the delay until an outbreak became a 
pandemic. Correlation was 0.84± 5.8 to the day pandemic status was achieved, and 
0.85 ±5.6 to the day of peak global incidence, see Figure 3. AEF is significantly and 
more strongly correlated to either epidemic outcome than any of the comparison 
network centrality measures, see Table 2 and Figure 4. 

The AEF proved robust to incomplete sampling. Degradation was most severe 
when airports were preferentially removed based on degree. Still, only three percent 
of non-U.S. airports showed more than 1% change in their computed ExF values 
when applying this scheme at the highest noise level. Even within the United States, 
only 22% of AEF values changed by more than 5%. See Figure 5. 

Discussion 

In all cases, AEF explains much of the variation in epidemic outcomes, suggesting 
that the early development of a pandemic is not stochastic, but rather strongly 
structured by the local connectivity of the seed location. The ability of the AEF 
to summarize this connectivity contributes substantially to our understanding of 
the role of individual airports in pandemic diffusion. These results are in harmony 
with other recent work claiming that relative arrival times of WAN-mediated pan¬ 
demics are independent of disease-specific parameters [4] and that a simple branch¬ 
ing process model describes early developments as well as complex metapopulation 
simulations [23]. 
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Degradation of the network had, in general, limited effect on airport AEF val¬ 
ues. Wrong information regarding a specific node could, however, produce a mis¬ 
leading AEF value for that airport. Epidemics seeded from airport PBJ (Paama 
Island, Vanuatu) took longer than expected to achieve pandemic status. This air¬ 
port is probably mischaracterized in the Open Flights database, as flights to this 
simple grass strip are not shown on the Vanuatu airlines online booking system 
(http;//www.airvanuatu.com/, last visited 23 March 2015). In the opposite direc¬ 
tion, Narita Airport (NRT, Tokyo, Japan) showed significantly greater pandemic 
risk than predicted by its AEF. This could be due to Japan’s intense population 
density combined with high local mobility, factors captured in the GLEaMvis sim¬ 
ulator but not the Open Flights database. 

Two outliers highlight a structural blind spot of the AEF metric. Epidemics seeded 
from airports ZRJ (Round Lake, Canada) and PVH (Porto Velho, Brazil) took 
longer than expected to achieve pandemic status. ZRJ is part of a small but locally 
dense community of airports serving first nation communities in Canada. This com¬ 
munity has limited connectivity to the rest of the WAN, and ZRJ is three flights 
distant from any airport outside this community (Winnipeg’s James Armstrong 
Richardson Airport YWG, Chicago Midway MDW, Toronto Pearson YYZ). Like¬ 
wise, PVH is two flights from any of Brazil’s international transport hubs. The AEF 
is here derived from an airport’s two-hop neighborhood, meaning for certain air¬ 
ports it is unaware of these network community boundaries. This limitation could 
perhaps be overcome by instead computing AEF based on a three-hop neighbor¬ 
hood. Given, however, that the WAN’s effective diameter is four hops, and the 
general good performance of the AEF, it is not clear that such an extension would 
substantially improve results globally. 

Airport expected force summarizes the size, density, and diversity of each airport’s 
neighborhood in the WAN. It combines features of degree, neighbor degree, and 
betweenness centrality in a statistically coherent manner. Airport degree is not a 
good descriptor of pandemic outcomes, since it does not account for a neighbor’s 
onward connectivity. Guimera et al noted that high degree does not well correlate to 
high centrality [16]. Nor does low degree correlate to an airport’s connection to the 
wider network, as illustrated by comparing Sweden’s Linkoping City Airport (LPI) 
to Alaska’s Huslia Airport (HSL). HSL has four outbound routes which connect to 
other rural Alaskan airports. LPI has only one outbound route, which connects to 
Amsterdam Schipol. Verma et al propose instead characterizing airports based on 
the number of network triangles they take part in, the t-core [15]. Plotting airport t- 
core against epidemic outcomes shows that its ability to explain epidemic outcomes 
is a result of its ability to successfully segment the WAN into core and periphery, 
see Figure 4. Thus t-core and AEF capture complementary aspects of an airport’s 
role in the WAN. 

The applicability of the AEF could be extended by modifying it to allow for 
varying transmissibility at individual airports. Such an extension would allow it 
to express differences in i.e. competent vector species populations or health care 
system readiness at different world locations. Since the AEF is the expectation of 
the force of infection, such an extension merely requires modifying the calculation 
of each transmission pattern’s force of infection along with the probability of that 
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specific pattern occurring. Both criteria can be met by adjusting edge weights in the 
underlying network model, implying that this extension could be implemented using 
the same framework as outlined in the current work. It would also be interesting to 
apply the expected force framework to disease spread through the world shipping 
network, a major transport system for several vector born pathogens along with 
their vector species [1]. The approach could also be tested on more local transmission 
network models, such as contacts in a hospital ward [26] or city-wide mobility data 
acquired from i.e. mobile phones [27, 28]. 

Conclusion 

An outbreak’s debut location is highly influential in its ability to become a pandemic 
threat. The AEF metric succinctly captures this influence, and can help inform 
monitoring and response strategies. 

These investigations pave the way for the development of simple, robust models 
capable of informing preparedness planning and policy directives. 
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Figures 
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Figure 1 Stochastic variation around the probability of a major outbreak under a simple 
branching process model. In a discrete time Reed-Frost branching process with finite population, 
the probability of a major outbreak is the smallest solution to a: = , shown above as the 

solid blue line, where Rq is the base reproductive number of the disease process. The black dots 
show empirically observed probabilities from simulations of the same model. Each dot is the 
observed fraction of major outbreaks out of 100 simulated outbreaks for a given value of Rq. For 
each value of Rq, 100 dots are generated. 
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Figure 2 AEF and invasion threshold. Higher AEF is associated with a lower transmission rate 
needed to trigger pandemics, as well as shorter delay until the outbreak reaches pandemic status. 
Large dots mark the lowest transmission rate for which a pandemic occurred, for each city. Cities 
are listed in order of increasing AEF. See also Table 1. 
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Figure 3 AEF and time to pandemic. Airport expected force has strong correlation to the rate at 
which a simulated epidemic becomes pandemic, an outcome which otherwise appears stochastic, 
as shown in the histograms in the right column. The top panel shows days to peak global 
incidence, the bottom days until the disease achieves pandemic prevelance. Outliers in the top plot 
are PBJ (Paama Island, Vanuatu), ZRJ (Round Lake, Canada), and PVH (Porto Velho, Brazil). 
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Figure 5 Robustness of AEF values to model degradation. For non-US airports, almost all AEF 
values are unaffected (defined as less than l%/5% change) as US airports are removed from the 
model. While many US airports are affected at the 1% level, few show more than 5% change in 
AEF. The network is degraded by removing airports with selection probabilities weighted 
uniformly, by AEF, and by degree. 

a 


Table 1 Seed locations. The following airports were selected as seed locations for testing 
relationships between AEF and invasion risk. The table additionally reports the number of days for an 
outbreak to reach pandemic status ("Pand.”) at the minimal observed transmission rate (^5) for which 
a pandemic occured, along with each airport’s t-core, (un)weighted degree, and (un)weighted 
eigenvalue centralities. 



lATA 

City 

Country 

Pand. 

/3 

AEF 

t-core 

Deg. 

w. Deg. 

Eigen. 

w. Eigen. 

8 

SMK 

St. Michael 

USA 

286 

0.45 

3 

1 

2 

4 

0.00 

0.00 

10 

YFS 

Fort Simpson 

Canada 

287 

0.45 

16 

0 

1 

2 

0.00 

0.00 

3 

GTE 

Groote Eyiandt 

Australia 

300 

0.45 

27 

3 

3 

4 

0.00 

0.00 

7 

SBH 

Gustavia 

France 

346 

0.42 

31 

7 

6 

9 

0.01 

0.00 

6 

PVH 

Porto Velho 

Brazil 

295 

0.45 

40 

4 

8 

19 

0.00 

0.00 

2 

BIS 

Bismarck 

USA 

360 

0.42 

51 

5 

4 

5 

0.01 

0.00 

1 

BES 

Brest 

France 

284 

0.42 

62 

24 

9 

15 

0.05 

0.00 

9 

XRY 

Jerez 

Spain 

294 

0.42 

70 

88 

17 

22 

0.13 

0.02 

5 

NRT 

Tokyo 

Japan 

22 

0.40 

85 

354 

98 

264 

0.36 

0.88 

4 

1ST 

Istanbul 

Turkey 

172 

0.41 

98 

387 

203 

314 

0.74 

0.64 


Table 2 Correlation and 95% confidence interval between a suite of network centrality measures 
(rows) and days until both pandemic status and peak global incidence. The abbreviation “W” 
indicates the weighted form of the measure. 


Measure 

Pandemic 

c.i. 

Peak 

C.i. 

AEF 

-0.84 

±0.06 

-0.85 

±0.06 

t-core 

-0.77 

±0.09 

-0.79 

±0.08 

Degree 

-0.72 

±0.10 

-0.75 

±0.09 

W degree 

-0.69 

±0.11 

-0.75 

±0.09 

Eigenvalue 

-0.70 

±0.11 

-0.74 

±0.09 

W eigenvalue 

-0.66 

±0.12 

-0.69 

±0.11 

Betweenness 

-0.54 

±0.14 

-0.56 

±0.14 

W betweenness 

-0.17 

±0.20 

-0.09 

±0.20 

Clustering coef. 

0.28 

±0.19 

0.24 

±0.19 

W Clust. coef. 

0.27 

±0.19 

0.24 

±0.19 


















